Artificial intelligence has seen major developments recently. OpenAI’s launch of its artificial intelligence tool ChatGPT set off a race in the field, and technology giants such as Microsoft and Google have been working to join it. Now Google has introduced Gemini, a rival to OpenAI’s GPT language models. So what are the features of Google Gemini, and how does it differ from GPT-4? Here are all the details…
Google Gemini features
Gemini, introduced by Google at its recent event, is designed for a wide range of uses and comes in three versions: Gemini Ultra, Gemini Pro and Gemini Nano.
The lightest version, Gemini Nano, is built to run on Android devices. Gemini Pro will power the company’s AI tools, including Google Bard, while the largest of them, Gemini Ultra, is designed for data center and enterprise applications.
Announced by Google CEO Sundar Pichai and developed by Google DeepMind, which is led by CEO and co-founder Demis Hassabis, Gemini can process different types of data, including text, audio, images, video and software code. It can understand and rewrite code written in popular programming languages such as Python, Java, C++ and Go.
Now powering Google Bard
Google Bard, which Google launched a few months ago as a ChatGPT rival, was originally built on the LaMDA language model. The technology giant announced that the artificial intelligence tool will now be powered by Gemini Pro, and that this change is available to everyone.
Trained with Google Tensor Processing Units
Google trained the first version of Gemini on its Tensor Processing Units (TPUs), the custom AI accelerator chips it has also used to train its other artificial intelligence tools. TPUs are used by other artificial intelligence companies as well.
Is it better than GPT-4?
According to Google, Gemini Ultra exceeds the previous state-of-the-art results on 30 of 32 widely used academic benchmarks for language models. With a score of 90 percent, Gemini Ultra became the first language model to outperform human experts on MMLU (Massive Multitask Language Understanding), a benchmark covering 57 subjects, including mathematics, physics, history, law, medicine and ethics. On this measure, it is ahead of GPT-4.
Skill performance comparison of Gemini Ultra and GPT-4:
Skill | Gemini Ultra | GPT-4 |
---|---|---|
General | 90.0 percent | 86.4 percent |
Reasoning | 83.6 percent | 83.1 percent |
Reading comprehension | 82.4 percent | 80.9 percent |
Commonsense reasoning | 87.8 percent | 95.3 percent |
Basic arithmetic operations | 94.4 percent | 92.0 percent |
Challenging math problems | 53.2 percent | 52.9 percent |
Python code generation | 74.4 percent | 67.0 percent |
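To make the size of each gap easier to see, the figures in the table can be turned into point differences. The snippet below is only an illustrative sketch: the dictionary keys, `margins` and `leader_counts` are names made up for this example, and the numbers are copied from the table rather than measured independently.

```python
# Scores copied from the table above, in percent (higher is better on every row).
# Each value is (Gemini Ultra, GPT-4). The short keys are hypothetical labels.
scores = {
    "general": (90.0, 86.4),
    "reasoning": (83.6, 83.1),
    "reading": (82.4, 80.9),
    "common_sense": (87.8, 95.3),
    "arithmetic": (94.4, 92.0),
    "hard_math": (53.2, 52.9),
    "python_codegen": (74.4, 67.0),
}


def margins(table):
    """Return {skill: Gemini Ultra score minus GPT-4 score}, rounded to 0.1."""
    return {skill: round(gemini - gpt4, 1) for skill, (gemini, gpt4) in table.items()}


def leader_counts(table):
    """Return (rows where Gemini Ultra leads, rows where GPT-4 leads)."""
    diffs = margins(table).values()
    return sum(d > 0 for d in diffs), sum(d < 0 for d in diffs)


if __name__ == "__main__":
    for skill, diff in margins(scores).items():
        print(f"{skill}: {diff:+.1f} points")
    print(leader_counts(scores))
```

On these figures Gemini Ultra leads on six of the seven rows, most clearly in Python code generation, while GPT-4 keeps a 7.5-point lead on the common sense row.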
Multimodal capabilities performance comparison of Gemini and GPT-4V:
Benchmark | Description | Gemini | OpenAI GPT-4V |
---|---|---|---|
MMMU | Multidisciplinary college-level reasoning problems | 59.4 percent | 56.8 percent |
VQAv2 | Natural image understanding | 77.8 percent | 77.2 percent |
TextVQA | OCR on natural images | 82.3 percent | 78.0 percent |
DocVQA | Document understanding | 90.9 percent | 88.4 percent |
Infographic VQA | Infographic understanding | 80.3 percent | 75.1 percent |
MathVista | Mathematical reasoning in visual contexts | 53.0 percent | 49.9 percent |
VATEX | English video captioning (CIDEr score) | 62.7 | 56.0 |
Perception Test MCQA | Video question answering | 54.7 percent | 46.3 percent |
CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1 | 29.1 |
FLEURS (62 languages) | Automatic speech recognition (word error rate, lower is better) | 7.6 percent | 17.6 percent |
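The FLEURS row is the only one where a lower number wins, because word error rate counts mistakes rather than successes. As a rough illustration of the standard definition (word substitutions, deletions and insertions divided by the number of words in the reference transcript), here is a minimal sketch. The function name `word_error_rate` is hypothetical, and this is not Google’s evaluation code: real evaluations usually normalize punctuation and casing first, which is not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of six reference words is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Read this way, a score of 7.6 percent means roughly 8 word-level errors per 100 reference words, against roughly 18 for the system in the other column.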
Google and Alphabet CEO Sundar Pichai’s statement is as follows:
Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. The shift we’re seeing now with AI will be much larger than the shift to mobile or to the web before it, and will be the most profound we’ve seen in my lifetime.
Artificial intelligence can create opportunities for people everywhere, from the everyday to the extraordinary. It will bring new waves of innovation and economic progress and spark knowledge, learning, creativity and productivity on an unprecedented scale. That’s what excites me: the chance to make AI helpful to everyone, everywhere in the world.
Nearly eight years into our journey as an AI-focused company, the pace of progress is only accelerating: millions of people are now using generative AI in our products to do things they couldn’t do even a year ago, from finding answers to more complex questions to collaborating and creating with new tools.
At the same time, developers are building new generative AI applications using our models and infrastructure, and start-ups and companies around the world are growing with our AI tools. This is incredible momentum, but we are only beginning to scratch the surface of what is possible.
We approach this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring great benefits to people and society, while collaborating with governments and experts to address risks as AI becomes more capable.
Guided by these principles, we continue to invest in the best tools, core models and infrastructure and to bring them to our products and to others. Now, we take the next step in our journey with Gemini, our most capable and general model, one that delivers state-of-the-art performance on many leading benchmarks.
Google DeepMind CEO and co-founder Demis Hassabis’s statement is as follows:
Artificial intelligence has been the focus of my life’s work, as it has for many of my research colleagues. Ever since I programmed artificial intelligence for computer games at a young age, and throughout my years as a neuroscience researcher trying to understand the way the brain works, I have believed that if we could build smarter machines, we could use them to benefit humanity.
This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. We have long wanted to build a new generation of AI models inspired by the way humans understand and interact with the world.
An AI that feels less like an intelligent piece of software and more like something useful and intuitive: an expert helper or assistant. Today, we’re taking one step closer to that vision and introducing Gemini, the most capable and general model we’ve ever built.
Gemini is the result of large-scale collaborative efforts between different teams at Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information, including text, code, audio, images and video.
Gemini is also our most flexible model, able to run efficiently on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly expand how developers and enterprise customers build and scale with AI.